Abstract - The problem of characterizing the impacts of the worst
data on the real-time locational marginal price (LMP) is considered.
Because the real-time LMP is computed from the estimated
network topology and system state, bad data that cause errors
in topology processing and state estimation affect the real-time LMP.
It is shown that the power system state space is partitioned
into price regions that are convex polytopes. Under different bad data
models, the worst case impacts of bad data on the real-time LMP
are analyzed. Numerical simulations are used to illustrate worst
case performance for the IEEE-14 and IEEE-118 networks.

Keywords -locational marginal price (LMP), real-time market, 
power system state estimation, bad data detection, cyber security 
of smart grid. 



I. Introduction 

THE deregulated electricity market has two interconnected
components. The day-ahead market determines the locational
marginal price (LMP) based on the dual variables of the
optimal power flow (OPF) solution given generator offers,
demand forecast, system topology, and security constraints.
The calculation of LMP in the day-ahead market does not
depend on the actual system operation. In the real-time market,
on the other hand, an ex-post formulation is often used (e.g.,
by PJM and ISO-New England [1]) to calculate the real-time
LMP by solving an incremental OPF problem. The LMPs in
the day-ahead and the real-time markets are combined in the
final clearing and settlement processes.

The real-time LMP is a function of data collected by the 
supervisory control and data acquisition (SCADA) system. 
Therefore, anomalies in data, if undetected, will affect prices 
in the real-time market. While the control center employs
a bad data detector to "clean" the real-time measurements,
missed detections and false alarms will inevitably occur. The
increasing reliance on the cyber system also comes with the
risk that malicious data may be injected by an adversary to
affect system and real-time market operations. An intelligent
adversary can carefully design a data attack to avoid detection
by the bad data detector.

Regardless of the source of data errors, it is of significant value
to assess potential impacts of data quality on the real-time
market, especially when a smart grid may in the future deploy
demand response based on real-time LMP. To this end, we
are interested in characterizing the impact of worst case data
errors on the real-time LMP. The focus on the worst case also
reflects the lack of an accurate model of bad data and our
desire to include the possibility of data attacks.

A. Summary of Results, Contexts, and Approaches 

We aim to characterize the effects of the worst data on 
real-time LMP. The complete characterization of worst data 
impact is not computationally tractable. Our goal here is to 
develop an optimization based approach to search for worst 
data and evaluate the effects of worst data by simulations. In
characterizing the relation between data and real-time LMP,
we first present a geometric characterization of the real-time
LMP.

In particular, we show that the state space of the power
system is partitioned into polytope price regions, as illustrated
in Fig. 1, where each polytope is associated with a unique
real-time LMP vector, and the price region $X_i$ is defined by a
particular set of congested lines that determine the boundaries
of the price region.




Fig. 1: Geometric characterization of LMP 

Two types of bad data are considered in this paper. One is
the bad data associated with meter measurements such as the
branch power flows in the network. Such bad data will cause
errors in state estimation, possibly moving, as an example,
the state estimate from the correct price region $X_0$ into $X_3$
(as shown in Fig. 2(a)). The analysis of the worst case data then
corresponds to finding the worst measurement error such that it
perturbs the correct state estimate into the worst price region.

The second type of bad data, one that has not been carefully
studied in the context of LMP in the literature, is error in
digital measurements such as switch or breaker states. Such
errors lead directly to topology errors, therefore causing a
change in the polytope structure as illustrated in Fig. 2(b).
In this case, even if the estimated system state changes little,
the prices associated with each region change, sometimes quite
significantly.

Fig. 2: Change of real-time LMPs due to bad data: (a) bad meter data; (b) topology error.

Before characterizing impacts of bad meter data on LMP, we 
need to construct appropriate models for bad data. To this end, 
we propose three increasingly more powerful bad data models 
based on the dependencies on real-time system measurements: 
state independent bad data, partially adaptive bad data, and 
fully adaptive bad data. 

In studying the worst case performance, we adopt a widely 
used approach that casts the problem as one involving an
adversary whose goal is to make the system performance as poor
as possible. By giving the adversary more information about 
the network state and endowing him with the ability to change 
data, we are able to capture the worst case performance, 
sometimes exactly and sometimes as bounds on performance. 
The approach of finding the worst data is therefore equivalent 
to finding the optimal strategy of an attacker who tries to 
perturb the real-time LMP and avoid being detected at the 
same time. To this end, we are able to formulate the problem 
of finding the optimal attack as certain convex optimizations. 

Thus, when we discuss "attacks" of an "adversary", we 
mean to use the notion of adversary as a proxy to construct 
worst data. In practice, however, it is not impossible that worst 
data obtained in this paper are results of a physically launched 
attack by someone who has the necessary system parameters 
and the ability to modify meter data before they reach the
control center.

Finally, we perform simulation studies using two benchmark
networks: the IEEE-14 and IEEE-118 networks. We observe
that bad data independent of the system state seem to have
limited impact on real-time LMPs, and greater price perturbation
can be achieved by state dependent bad data. The results
also demonstrate that the real-time LMPs are subject to much
larger perturbation if bad topology data are present in addition
to bad meter data. While substantial price changes can be
realized for small networks by the worst meter data, as the
size of the network grows while the measurement redundancy
rate remains the same, the influence of worst meter data on
LMP is reduced. However, a larger system actually gives more
possibilities for the bad topology data to perturb the real-time
LMP more significantly.

Our simulation results also show a degree of robustness 
provided by the nonlinear state estimator. While there have 
been many studies on data injection attacks based on DC 
models, very few consider the fact that the control center 
typically employs the nonlinear WLS state estimator under 
the AC model. Our simulation shows that the effect of bad 
analog data on LMP is significantly mitigated by the nonlinear 
estimator whereas bad topology data coupled with bad analog 
data can have significant impacts on LMP. 

B. Related Work 

Effects of bad data on power systems have been studied
extensively in the past; see [3], [4], [5]. Finding the worst case
bad data is naturally connected with the problem of malicious 
data. In this context, the results presented in this paper can be 
viewed as one of analyzing the impact of the worst (malicious) 
data attack. 

In a seminal paper by Liu, Ning, and Reiter [6], the authors
first illustrated the possibility that, by compromising a sufficient
number of meters, an adversary can perturb the state estimate
arbitrarily in some subspace of the state space without being
detected by any bad data detector. Such attacks are referred
to as strong attacks. It was shown by Kosut et al. [7], [8] that
the condition for the existence of such undetectable attacks
is equivalent to the classical notion of network observability.
The connection to network observability leads to a graph
theoretic approach to characterizing the vulnerability of the
power system by the so-called security index, the smallest
set of attacked meters that will cause unobservability [8].

When the adversary can only inject malicious data from a
small number of meters, strong attacks do not exist, and any
injected malicious data can be detected with some probability.
Such attacks are referred to as weak attacks [7]. In order
to affect the system operation in some meaningful way, the
adversary has to risk being detected by the control center. The
impacts of weak attacks on power systems are not well understood
because the detection of such bad data is probabilistic.
Our results are perhaps the first to quantify such impacts. Most
related research works focus on the DC model and linear estimator,
while only a few have addressed the nonlinearity effect [9].

It is well recognized that bad data can also cause topology
errors [10], [11], [12], and techniques have been developed to
detect topology errors. For instance, the residue vector from
state estimation was analyzed for topology error detection
[11], [10], [13]. Monticelli [14] first introduced the idea
of generalized state estimation where, roughly speaking, the
topology that fits the meter measurements best is chosen as
the topology estimate. Abur et al. [15] and Mili et al. [16]
extended the idea to various state estimation formulations. The
impacts of topology errors on the electricity market have not been
reported in the literature, and this paper aims to bridge this
gap.

The effect of data quality on the real-time market was first
considered in [17], [18]. In [18], the authors presented the
financial risks induced by the data perturbation and proposed
a heuristic technique for finding a case where a price change
happens. While there are similarities between this paper
and [18], several significant differences exist: (i) This paper
presents a worst case analysis. We focus on finding the worst
case, not only a feasible case. (ii) This paper considers a
more general class of bad data where bad data may depend
dynamically on the actual system measurements, whereas bad
data considered in [18] are static. (iii) This paper considers a
broader range of bad data that also includes bad topology data,
and our evaluations are based on the AC network model and
the presence of a nonlinear state estimator.



C. Organization and Notation 

This paper is organized as follows. Section II briefly
describes a model of real-time LMP and introduces its geometric
characterization in the state space of the power system. This
geometric characterization is the key in guiding our search for
the worst case real-time price perturbation. Section III
establishes the bad data models and summarizes state estimation
and bad data detection procedures at the control center. In
Section IV, we first use a simple example to show how the
price can be affected by bad meter data. Then a metric of
impact on real-time LMP caused by bad meter data is
introduced. We then discuss the algorithms for finding the worst case
bad meter data vector in terms of real-time price perturbation
under the three different bad data models. Section V considers
the effect of bad topology data on real-time LMP. Finally, in
Section VI, simulation results are presented based on the IEEE-14
and IEEE-118 networks.

Notations used in this paper are standard. Vector $x =
(x_1, \ldots, x_n)$ is a column vector. We use the convention that,
if $x$ is a vector associated with some quantity of the system
(e.g., system state), $\hat{x}$ is an estimate of $x$ from real data. For
reference, a list of global variables is given below.



Symbol                          Description
$a$                             vector added to meter data by adversary
$c$                             vector of generation offers
$d$                             vector of forecasted demands
$f$, $\hat{f}$                  vector of branch flows and its real-time estimate
$p^*$                           optimal day-ahead dispatch vector
$r$                             measurement residue
$s$                             transmission line connectivity measurements
$w$                             measurement noise
$x$, $\hat{x}$                  system state vector and its real-time estimate
$z$                             vector of real-time measurements
$\mathcal{A}$                   set of feasible perturbations on meter data
$A$                             sensitivity matrix of branch flows with respect to
                                power injections, also called PTDF
                                (Power Transfer Distribution Factors)
$\mathcal{B}$                   set of possible bad topology data
$\mathcal{C}$, $\hat{\mathcal{C}}$   congestion pattern and its real-time estimate
$\mathcal{E}$                   set of connected transmission lines between buses
$\mathcal{E}_a$                 set of lines that the adversary aims to remove
$F$                             sensitivity matrix of branch flows with respect to state
$\mathcal{G}$                   directed graph representing the network topology
$H$                             measurement matrix of the DC model
$K$                             linear state estimation operator
$T^{\max}$                      vector of branch flow limits
$\mathcal{V}$                   set of buses
$\mathcal{X}$                   state space
[illegible]                     [illegible]
$\lambda$                       real-time LMP vector
$\tau$                          threshold of bad data detector



II. Structures of Real-Time LMP 

In this section, we present first a model for the computation 
of real-time locational marginal price (LMP). While ISOs have 
somewhat different methods of computing real-time LMP, they 
share the same two-settlement architecture and similar ways 
of using real-time measurements. In the following, we will use
a simplified ex-post real-time market model, adopted by PJM,
ISO New England, and other ISOs [1], [20]. Our purpose
is not to include all details; we aim to capture the essential
features.

In real-time, in order to monitor and operate the system, the
control center will calculate the estimated system conditions
(including bus voltages, branch flows, generation, and demand)
based on real-time measurements. We call a branch congested
if the estimated flow is larger than or equal to the security limit.
The congestion pattern is defined as the set of all congested
lines, denoted as $\hat{\mathcal{C}}$. Notice that we use the hat here since the
congestion pattern is a function of the state estimate. Details of
state estimation and bad data detection are discussed in Section
III.

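One important usage of state estimation is calculating the
real-time LMP. Given the estimated congestion pattern $\hat{\mathcal{C}}$, the
following linear program is solved to find the incremental OPF
dispatch and associated real-time LMP, $\lambda = (\lambda_i)$ [19]:

$$
\begin{array}{ll}
\text{minimize} & \sum_i c_i^G \Delta p_i - \sum_j c_j^L \Delta d_j \\
\text{subject to} & \sum_i \Delta p_i = \sum_j \Delta d_j, \\
& \Delta p_i^{\min} \le \Delta p_i \le \Delta p_i^{\max}, \\
& \Delta d_j^{\min} \le \Delta d_j \le \Delta d_j^{\max}, \\
& \sum_i A_{ki}\Delta p_i - \sum_j A_{kj}\Delta d_j \le 0, \quad \text{for all } k \in \hat{\mathcal{C}},
\end{array}
\tag{1}
$$

where $\Delta d = (\Delta d_j)$ is the vector of incremental dispatchable
load, $\Delta p = (\Delta p_i)$ the vector of incremental generation
dispatch, $c^G = (c_i^G)$ and $c^L = (c_j^L)$ the corresponding real-time
marginal costs of generation and dispatchable loads, $\Delta p_i^{\min}$ and
$\Delta p_i^{\max}$ the lower and upper bounds for incremental dispatch,
and $A_{ki}$ the sensitivity of branch flow on branch $k$ with respect
to the power injection at bus $i$.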

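Suppose that we add a virtual demand $d_u$ at bus $u$. The
corresponding Lagrangian of the modified incremental
OPF with $d_u$ is given by

$$
\begin{aligned}
L ={}& \sum_i c_i^G \Delta p_i - \sum_j c_j^L \Delta d_j
 - \eta\Bigl(\sum_i \Delta p_i - \sum_j \Delta d_j - d_u\Bigr) \\
&+ \sum_i \bigl(\phi_i^{+}(\Delta p_i^{\max} - \Delta p_i) + \phi_i^{-}(\Delta p_i - \Delta p_i^{\min})\bigr) \\
&+ \sum_j \bigl(\psi_j^{+}(\Delta d_j^{\max} - \Delta d_j) + \psi_j^{-}(\Delta d_j - \Delta d_j^{\min})\bigr) \\
&+ \sum_{k \in \hat{\mathcal{C}}} \mu_k \Bigl(\sum_i A_{ki}\Delta p_i - \sum_j A_{kj}\Delta d_j - A_{ku} d_u\Bigr),
\end{aligned}
$$

where the scalars $\eta$, $\phi_i^{+}$, $\phi_i^{-}$, $\psi_j^{+}$, $\psi_j^{-}$, and $\mu_k$ are the
corresponding dual variables of the problem in (1).

The real-time LMP at bus $u$ is defined as the overall cost
increase when one unit of extra load is added at bus $u$, which
is the derivative of the Lagrangian with respect to
the virtual demand $d_u$ at $d_u = 0$, i.e.,

$$
\lambda_u = \left.\frac{\partial L}{\partial d_u}\right|_{d_u = 0}
 = \eta - \sum_{k \in \hat{\mathcal{C}}} A_{ku}\,\mu_k.
\tag{2}
$$

Notice that once the congestion pattern $\hat{\mathcal{C}}$ is determined, the
whole incremental OPF problem (1) no longer depends on the
measurement data, and thus neither do real-time LMPs.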
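
As a small illustration of how the LMP at each bus follows from the dual variables, the sketch below evaluates $\lambda_u = \eta - \sum_k A_{ku}\mu_k$ over the congested lines. The function name, the 3-bus numbers, and the sign convention for $\eta$ are assumptions made for illustration; they are not taken from any market implementation.

```python
def real_time_lmp(eta, shift_factors, mu, bus):
    """LMP at `bus`: energy component minus congestion component.

    eta           -- dual variable of the power balance constraint (assumed sign)
    shift_factors -- dict: congested line k -> list of A[k][u] over buses u
    mu            -- dict: congested line k -> dual variable of its flow limit
    """
    return eta - sum(shift_factors[k][bus] * mu[k] for k in mu)


# Hypothetical 3-bus system with a single congested line (line 0).
A = {0: [0.0, 0.5, -0.25]}
mu = {0: 8.0}
eta = 30.0

lmps = [real_time_lmp(eta, A, mu, u) for u in range(3)]
```

With no congested lines (`mu` empty) every bus sees the same price `eta`, which matches the observation that price differences across buses come only from the congestion pattern.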

Under the DC model, the power system state, $x$, is defined
as the vector of voltage phases, except the phase on the
reference bus. The power flow vector $f$ is a function of the
system state $x$,

$$ f = Fx, \tag{3} $$

where $F$ is the sensitivity matrix of branch flows with respect
to the system state.

Assume the system has $n + 1$ buses. Then, $x \in \mathcal{X} =
[-\pi, \pi]^n$, where $\mathcal{X}$ represents the state space. Any system
state corresponds to a unique point in $\mathcal{X}$. From (3), the branch
flow $f$ is determined by the system state $x$. Comparing the
flows with the flow limits, we obtain the congestion pattern
associated with this state. Hence, each point in the state space
corresponds to a particular congestion pattern. The following
theorem establishes the relationship between the state space
and real-time price.

Theorem 1 (Price Partition of the State Space): The state
space $\mathcal{X}$ is partitioned into a set of polytopes $\{X_i\}$ where
the interior of each $X_i$ is associated with a unique congestion
pattern and a real-time LMP vector. Each boundary
hyperplane of $X_i$ is defined by a single transmission line.

Proof: For a particular congestion pattern $\mathcal{C}$ defined by a
set of congested lines, the set of states that gives $\mathcal{C}$ is

$$
X_i = \{x : F_{j\cdot}\,x \ge T_j^{\max} \ \forall j \in \mathcal{C},\quad
F_{j\cdot}\,x < T_j^{\max} \ \forall j \notin \mathcal{C}\},
$$

where $F$ is the linear matrix factor between state and line
flows (see (3)), $F_{j\cdot}$ the $j$th row of $F$, and $T_j^{\max}$ the flow limit
on branch $j$.

Since $X_i$ is defined by the intersection of a set of half spaces,
it is a polytope. From the previous discussion, each congestion
pattern is associated with a particular real-time LMP vector $\lambda$.
All states with the same congestion pattern share the same
real-time LMP. Hence each convex polytope $X_i$ in $\mathcal{X}$ corresponds
to a real-time LMP vector. ■
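
The mapping from a state to its price region can be sketched in a few lines: compute the DC flows $f = Fx$ and collect the lines whose flow meets or exceeds the limit. The matrix `F`, the limits, and the two states below are made-up numbers for illustration only.

```python
def branch_flows(F, x):
    """DC branch flows f = F x, with F given as a list of rows."""
    return [sum(f_jk * x_k for f_jk, x_k in zip(row, x)) for row in F]


def congestion_pattern(F, x, t_max):
    """Set of lines j whose flow is at or above the limit T_j^max."""
    flows = branch_flows(F, x)
    return frozenset(j for j, (f, t) in enumerate(zip(flows, t_max)) if f >= t)


# Hypothetical 3-line, 2-state example.
F = [[1.0, 0.0],
     [1.0, -1.0],
     [0.0, 1.0]]
t_max = [1.0, 1.0, 1.0]

x_true = [0.5, 0.25]      # flows [0.5, 0.25, 0.25]: no congestion
x_perturbed = [1.5, 0.25]  # flows [1.5, 1.25, 0.25]: lines 0 and 1 congested

region_true = congestion_pattern(F, x_true, t_max)
region_perturbed = congestion_pattern(F, x_perturbed, t_max)
```

Two states share a real-time LMP vector exactly when this function returns the same set for both, so a data perturbation changes prices only if it changes the returned pattern.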

Theorem 1 characterizes succinctly the relationship between
the system state and LMP. As illustrated in Fig. 2(a), if bad
data are to alter the LMP in real-time, the size of the bad data
has to be sufficiently large so that the state estimate at the
control center is moved to a different price region from the
true system state.

On the other hand, if some lines are erroneously removed
from or added to the correct topology, as illustrated in
Fig. 2(b), it affects the LMP calculation in two ways. First,
the price partition of the state space changes due to the errors
in topology information. Second, the shift matrix $A$ in (1),
which is a function of topology, changes, thereby altering the prices
attached to each price region.

III. Data Model and State Estimation 

A. Bad Data Model 

1) Meter data: In order to monitor the system, various
meter measurements are collected in real time, such as power
injections, branch flows, voltage magnitudes, and phasors,
denoted by a vector $z$. If there exist bad data $a$ among the
measurements, the measurement with bad data, denoted by $z_a$,
can then be expressed as a function of the system state $x$,

$$ z_a = z + a = h(x) + w + a, \quad a \in \mathcal{A}, \tag{4} $$

where $w$ represents the random measurement noise.

We make a distinction here between the measurement 
noise and bad data; the former accounts for random noise 
independently distributed across all meters whereas the latter 
represents the perturbation caused by bad or malicious data. 
We assume no specific pattern for bad data except that they
do not happen everywhere. We assume that bad data can only
happen in a subset of the measurements, $\mathcal{S}$. If the cardinality
of $\mathcal{S}$ is $k$, the feasible set of bad data $a$ is a $k$-dimensional
subspace, denoted as $\mathcal{A} = \{a : a_i = 0 \text{ for all } i \notin \mathcal{S}\}$.

We will consider three bad data models with increasing 
power of affecting state estimates. 

M1. State independent bad data: This type of bad data is
independent of real-time measurements. Such bad data may
be the replacement of missing measurements.

M2. Partially adaptive bad data: This type of bad data may
arise from the so-called man in the middle (MiM) attack where
an adversary intercepts the meter data and alters the data based
on what he has observed. Such bad data can adapt to the
system operating state.

M3. Fully adaptive bad data: This is the most powerful
type of bad data, constructed based on the actual measurement
$z = h(x) + w$.

Note that M3 is in general not realistic. Our purpose of 
considering this model is to use it as a conservative proxy to 
obtain performance bounds for the impact of worst case data. 

We assume herein a DC model in which the measurement
function $h(\cdot)$ in (4) is linear. Specifically,

$$ z_a = Hx + w + a, \quad a \in \mathcal{A}, \tag{5} $$

where $H$ is the measurement matrix. Such a DC model,
while widely used in the literature, may only be a crude
approximation of the real power system. By making such a
simplifying assumption and acknowledging its weaknesses,
we hope to obtain tractable solutions in searching for worst
case scenarios. It is important to note that, although the worst
case scenarios are derived from the DC model, we carry out
simulations using the actual nonlinear system model.

2) Topology data: Topology data are represented by a
binary vector $s \in \{0,1\}^l$, where each entry of $s$ represents
the state of a line breaker (0 for open and 1 for closed). The
bad topology data are modeled as

$$ s_b = s + b \pmod 2, \quad b \in \mathcal{B}, \tag{6} $$

where $\mathcal{B} \subset \{0,1\}^l$ is the set of possible bad data. When
bad data are present, the topology processor will generate
the topology estimate corresponding to $s_b$, and this incorrect
topology estimate will be passed to the following operations
unless detected by the bad data detector.
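
The mod-2 model of bad topology data amounts to flipping the breaker bits selected by $b$. A minimal sketch, with a hypothetical four-breaker status vector, is:

```python
def apply_topology_bad_data(s, b):
    """Return s_b = s + b (mod 2): entries of s where b is 1 are flipped."""
    if len(s) != len(b):
        raise ValueError("s and b must have the same length")
    return [(s_i + b_i) % 2 for s_i, b_i in zip(s, b)]


# Hypothetical breaker statuses (1 = closed, 0 = open).
s = [1, 1, 1, 0]
b = [0, 1, 0, 0]   # bad data flips the second breaker only

s_b = apply_topology_bad_data(s, b)
```

Applying the same `b` twice restores the original vector, which reflects that addition mod 2 is its own inverse.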

B. State Estimation 

We assume that the control center employs the standard
weighted least squares (WLS) state estimator

$$ \hat{x} = \arg\min_x \,(z - h(x))^T R^{-1} (z - h(x)), \tag{7} $$

where $R$ is the covariance matrix of the measurement noise $w$.
Under the DC model, the WLS estimator is given by

$$ \hat{x} = Kz, \quad K = (H^T R^{-1} H)^{-1} H^T R^{-1}. \tag{8} $$

If the noise $w$ is Gaussian, the WLS estimator is also
the maximum likelihood estimate (MLE) of the state $x$. By the
invariance property of the MLE, from (3), the maximum likelihood
estimate of the branch flows is calculated as

$$ \hat{f} = F\hat{x} = FKz. \tag{9} $$

The estimated congestion pattern used in the real-time LMP
calculation (1) consists of all the branches whose estimated flows
are larger than or equal to the branch flow limits, i.e.,

$$ \hat{\mathcal{C}} = \{\, j : \hat{f}_j \ge T_j^{\max} \,\}, \tag{10} $$

where $T_j^{\max}$ is the flow limit on branch $j$.

In the presence of bad meter data $a$, the meter measurements
collected by the control center are actually $z_a = Hx + w + a$.
Using $z_a$, the WLS state estimate is

$$ \hat{x} = K z_a = \hat{x}^* + Ka, \tag{11} $$

where $\hat{x}^* = Kz$ is the "correct" state estimate without the
presence of the bad data (i.e., $a = 0$).

Equation (11) shows that the effect of bad data on state estimation
is linear. However, because $a$ is confined to a $k$-dimensional
subspace $\mathcal{A}$, the perturbation of the state estimate is
limited to certain directions.
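
The linear effect of bad data on the WLS estimate can be checked numerically in the simplest possible setting. The sketch below assumes a single scalar state measured directly by every meter (so $H$ is a column of ones and $R$ is diagonal); this special case and all numbers are illustrative assumptions, not the networks studied in the paper.

```python
def wls_gain(variances):
    """K = (H^T R^-1 H)^-1 H^T R^-1 for H = column of ones, R = diag(variances)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    return [w / total for w in weights]


def apply_gain(K, z):
    """Scalar estimate K z."""
    return sum(k * z_i for k, z_i in zip(K, z))


variances = [1.0, 1.0, 2.0]
K = wls_gain(variances)            # approximately [0.4, 0.4, 0.2]

z = [1.0, 1.2, 0.9]                # hypothetical clean measurements
a = [0.0, 0.5, 0.0]                # bad data confined to the second meter
z_a = [z_i + a_i for z_i, a_i in zip(z, a)]

x_star = apply_gain(K, z)          # "correct" estimate
x_bad = apply_gain(K, z_a)         # estimate from corrupted data
shift = apply_gain(K, a)           # K a
```

Here `x_bad` equals `x_star + shift` up to rounding, and the shift depends only on `a` and `K`, not on the measurements themselves.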

When bad data exist both in meter and topology data,
the control center uses a wrong measurement matrix $\bar{H}$,
corresponding to the altered topology data, and the altered
meter data $z_a$. Then, the WLS state estimate becomes

$$ \hat{x} = \bar{K} z_a = \bar{K} z + \bar{K} a, \tag{12} $$

where $\bar{K} = (\bar{H}^T R^{-1} \bar{H})^{-1} \bar{H}^T R^{-1}$. Note that unlike the linear
effect of bad meter data, bad topology data affect the state
estimate by altering the measurement matrix $H$ to $\bar{H}$.



C. Bad Data Detection 

The control center uses bad data detection to minimize
the impact of bad data. Here, we assume a standard bad
data detector used in practice, the $J(\hat{x})$-detector in [4]. In
particular, the $J(\hat{x})$-detector performs the test on the residue
error, $r = z - H\hat{x}$, based on the state estimate $\hat{x}$. From the
WLS state estimate we have

$$ r = \bigl(I - H(H^T R^{-1} H)^{-1} H^T R^{-1}\bigr) z = Uz. \tag{13} $$

The $J(\hat{x})$-detector is a threshold detector defined by

$$ r^T R^{-1} r = z^T W z \ \underset{\text{good data}}{\overset{\text{bad data}}{\gtrless}}\ \tau, $$

where $W = U^T R^{-1} U$ and $\tau$ is the threshold calculated from a
prescribed false alarm probability. When the measurement data fail to pass the
bad data test, the control center declares the existence of bad
data and takes corresponding actions to identify and remove
the bad data.

In this paper, we are interested in those cases when bad data
are present while the $J(\hat{x})$-detector fails to detect them.
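
To make the threshold test concrete, the sketch below computes the weighted residue statistic $r^T R^{-1} r$ and compares it with $\tau$, again under the illustrative assumption of one scalar state measured directly by three meters. The threshold 5.991 is the 95% quantile of a chi-square distribution with $m - n = 2$ degrees of freedom, corresponding to a 5% false alarm probability; the measurement values are hypothetical and noise-free for clarity.

```python
def j_statistic(z, variances):
    """Weighted squared residue r^T R^-1 r for H = column of ones."""
    weights = [1.0 / v for v in variances]
    x_hat = sum(w * z_i for w, z_i in zip(weights, z)) / sum(weights)
    return sum((z_i - x_hat) ** 2 / v for z_i, v in zip(z, variances))


def bad_data_detected(z, variances, tau):
    """Declare bad data when the statistic exceeds the threshold tau."""
    return j_statistic(z, variances) > tau


TAU = 5.991                        # chi-square, 2 degrees of freedom, 95% quantile
variances = [0.01, 0.01, 0.01]

small = [1.0, 1.1, 1.0]            # small error on one meter
large = [1.0, 1.9, 1.0]            # large error on one meter
consistent = [1.5, 1.5, 1.5]       # every meter shifted by the same amount

results = [bad_data_detected(z, variances, TAU) for z in (small, large, consistent)]
```

The third case leaves the residue at zero because the perturbation lies in the column space of $H$; it is the kind of bad data that a residue-based detector cannot see, while the first case shows a small perturbation that passes the test.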

IV. Impact of Bad Data on LMP 

In this section, we examine the impact of bad data on LMP, 
assuming that the topology estimate of the network is correct. 

A. Relative Price Perturbation 

In order to quantify the effect of bad data on real-time price, 
we need to first define the metric to measure the effect. We 
define the relative price perturbation (RPP) as the expected 
percentage price perturbation caused by bad data. Given that 
LMP varies at different buses, RPP also varies at different 
locations. 

Let $z_a$ be the data received at the control center and $\lambda_i(z_a)$
the LMP at bus $i$. The RPP at bus $i$ is a function of bad data
$a$, given by

$$ \mathrm{RPP}_i(a) = \mathbb{E}\left[\,\left|\frac{\lambda_i(z_a) - \lambda_i(z)}{\lambda_i(z)}\right|\,\right], \tag{14} $$

where the expectation is over random measurements.

To measure the system-wide price perturbation, we define
the average relative price perturbation (ARPP) by

$$ \mathrm{ARPP}(a) = \frac{1}{n+1} \sum_i \mathrm{RPP}_i(a), \tag{15} $$

where $n + 1$ is the number of buses in the system.
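
In simulation, the expectation in the RPP is typically replaced by a sample average over Monte Carlo runs. The sketch below assumes each run provides the LMP vector computed from clean data and the one computed from corrupted data; the function names and the two-bus prices are hypothetical.

```python
def rpp(clean_runs, bad_runs):
    """Per-bus sample average of |lambda_i(z_a) - lambda_i(z)| / |lambda_i(z)|."""
    n_runs = len(clean_runs)
    n_bus = len(clean_runs[0])
    totals = [0.0] * n_bus
    for clean, bad in zip(clean_runs, bad_runs):
        for i in range(n_bus):
            totals[i] += abs(bad[i] - clean[i]) / abs(clean[i])
    return [t / n_runs for t in totals]


def arpp(clean_runs, bad_runs):
    """Average of the per-bus RPP values over all buses."""
    per_bus = rpp(clean_runs, bad_runs)
    return sum(per_bus) / len(per_bus)


# Two hypothetical runs on a two-bus system.
clean_runs = [[20.0, 30.0], [20.0, 30.0]]
bad_runs = [[20.0, 33.0], [22.0, 30.0]]

per_bus = rpp(clean_runs, bad_runs)
system_wide = arpp(clean_runs, bad_runs)
```

In this example each bus is perturbed by 10% in one of the two runs, so both per-bus values and the system-wide average come out at about 5%.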



B. Worst RPP under State Independent Bad Data Model 

First, we consider the state independent bad data model
(M1) given in Section III-A. In this model, the bad data are
independent of real-time measurements.

In constructing the state independent worst data, it is useful
to incorporate prior information about the state. To this end,
we assume that the system state follows a Gaussian distribution
with mean $x_0$ and covariance matrix $\Sigma_x$. Typically, we choose $x_0$
as the day-ahead dispatch since the nominal system state in
real-time varies around its day-ahead projection.

In the presence of bad data $a$, the expected state estimate
and branch flow estimate on branch $i$ are given by

$$ \mathbb{E}[\hat{x}] = x_0 + Ka, \tag{16} $$
$$ \mathbb{E}[\hat{f}_i] = F_{i\cdot}\,\mathbb{E}[\hat{x}] = F_{i\cdot}\,x_0 + F_{i\cdot}\,Ka, \tag{17} $$

where $F_{i\cdot}$ is the row of $F$ corresponding to branch $i$.

Our strategy is to move this expected state estimate into
the region with the largest price perturbation among all the
possible regions, $\mathcal{C}^*$. From (10), this means making all the
expected branch flows satisfy the boundary conditions of $\mathcal{C}^*$ as
follows:

$$
\begin{aligned}
\mathbb{E}[\hat{f}_i] &\ge T_i^{\max} \quad \text{for } i \in \mathcal{C}^*, \\
\mathbb{E}[\hat{f}_j] &< T_j^{\max} \quad \text{for } j \notin \mathcal{C}^*.
\end{aligned}
\tag{18}
$$

However, due to the uncertainty (from both the system state
$x$ and the measurement noise $w$), the actual estimated state after
attack, $\hat{x}$, may be different from $\mathbb{E}[\hat{x}]$. Therefore, we want to
place $\mathbb{E}[\hat{x}]$ at the "center" of the desired price region, i.e.,
to maximize the shortest distance from $\mathbb{E}[\hat{x}]$ to the boundaries
of the polytope price region while still satisfying the boundary
constraints. The shortest distance, measured in terms of branch
flows, can be calculated as

$$ \beta = \max\{\beta' : |\mathbb{E}[\hat{f}_i] - T_i^{\max}| \ge \beta' \text{ for all } i\} = \min_i\, |\mathbb{E}[\hat{f}_i] - T_i^{\max}|. \tag{19} $$
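
The margin of an expected flow vector inside a target price region can be sketched as follows. The function first checks that the expected flows reproduce the target congestion pattern and then returns the smallest gap between any expected flow and its limit; the name `region_margin` and the numbers are assumptions for illustration.

```python
def region_margin(expected_flows, t_max, pattern):
    """Return min_i |E[f_i] - T_i^max| if the flows match `pattern`, else None."""
    for j, (f, t) in enumerate(zip(expected_flows, t_max)):
        congested = f >= t
        if congested != (j in pattern):
            return None            # expected state lies outside the target region
    return min(abs(f - t) for f, t in zip(expected_flows, t_max))


# Hypothetical expected flows and limits on three lines.
expected_flows = [1.5, 0.25, 0.5]
t_max = [1.0, 1.0, 1.0]

margin_in_region = region_margin(expected_flows, t_max, frozenset({0}))
margin_other = region_margin(expected_flows, t_max, frozenset({1}))
```

A larger margin means that random deviations of the actual estimate from its expectation are less likely to push it across a boundary into a neighboring price region.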

However, the existence of the bad data detector prevents the
bad data vector $a$ from being arbitrarily large. According to
(13), the weighted squared residue with $a$ is

$$ r^T R^{-1} r = (w + a)^T W (w + a). \tag{20} $$

Heuristically, since $w$ has zero mean, the term $a^T W a$ can be
used to quantify the effect of data perturbation on the estimation
residue. We therefore use $a^T W a \le \epsilon$ to control the detection
probability in the following optimization.
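
The step from (13) to (20), and the reason $a^T W a$ is a natural proxy, can be written out as follows. The derivation assumes $W = U^T R^{-1} U$ (which is what makes $z^T W z$ equal to $r^T R^{-1} r$) and a bad data vector $a$ that does not depend on the noise $w$, as in model M1.

```latex
\begin{aligned}
UH &= \bigl(I - H(H^{T}R^{-1}H)^{-1}H^{T}R^{-1}\bigr)H = H - H = 0, \\
r &= U z_a = U(Hx + w + a) = U(w + a), \\
r^{T}R^{-1}r &= (w + a)^{T}\,U^{T}R^{-1}U\,(w + a) = (w + a)^{T}W(w + a), \\
\mathbb{E}\bigl[r^{T}R^{-1}r\bigr] &= a^{T}Wa + \mathbb{E}\bigl[w^{T}Ww\bigr]
\qquad \text{since } \mathbb{E}[w] = 0 .
\end{aligned}
```

The noise contribution $\mathbb{E}[w^{T}Ww]$ does not depend on $a$, so bounding $a^{T}Wa$ bounds the average increase of the detector statistic caused by the bad data.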

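Therefore, for a specific congestion pattern $\mathcal{C}$, the adversary
will solve the following optimization problem to move the
state estimate to the "center" of the price region $\mathcal{C}$ and keep
the detection probability low:

$$
\begin{array}{ll}
\underset{a \in \mathcal{A},\ \beta \ge 0}{\text{maximize}} & \beta \\
\text{subject to} & \mathbb{E}[\hat{f}_i] - \beta \ge T_i^{\max}, \quad i \in \mathcal{C}, \\
& \mathbb{E}[\hat{f}_j] + \beta \le T_j^{\max}, \quad j \notin \mathcal{C}, \\
& a^T W a \le \epsilon,
\end{array}
\tag{21}
$$

which is a convex program that can be solved easily in
practice. We call a region $\mathcal{C}$ feasible if it makes problem (21)
feasible.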

The set of all the feasible congestion patterns is denoted as
$\Gamma$. In order to find the "worst" feasible region with the largest price
perturbation, we need to screen all the possible congestion
patterns.

If we use RPP as the index with bus $i$ as the target, the
worst region is chosen as

$$ \mathcal{C}^* = \arg\max_{\mathcal{C} \in \Gamma}\, |\lambda_i - \lambda_i(\mathcal{C})|, \tag{22} $$

where $\lambda_i$ is the LMP at bus $i$ when $x_0$ is the system state.
If we use ARPP as the index, the worst region is chosen as

$$ \mathcal{C}^* = \arg\max_{\mathcal{C} \in \Gamma}\, \sum_i \left|\frac{\lambda_i - \lambda_i(\mathcal{C})}{\lambda_i}\right|. \tag{23} $$

Therefore, the worst case constant bad data vector is the
solution to optimization problem (21) with the congestion
pattern set to $\mathcal{C}^*$.
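
Once the feasible congestion patterns and their LMP vectors are known, the screening step is a simple argmax. The sketch below assumes the caller has already determined feasibility (for example by solving the convex program for each candidate) and supplies a mapping from pattern to LMP vector; the function name and all prices are hypothetical.

```python
def worst_region(baseline, region_lmps, bus=None):
    """Pick the feasible pattern with the largest price perturbation.

    baseline    -- LMP vector at the nominal state x_0
    region_lmps -- dict: feasible congestion pattern -> LMP vector in that region
    bus         -- target bus for the single-bus criterion; None selects the
                   system-wide (sum of relative deviations) criterion
    """
    def score(lmps):
        if bus is not None:
            return abs(baseline[bus] - lmps[bus])
        return sum(abs(b - l) / abs(b) for b, l in zip(baseline, lmps))

    return max(region_lmps, key=lambda c: score(region_lmps[c]))


baseline = [20.0, 20.0]
region_lmps = {
    frozenset({0}): [20.0, 30.0],
    frozenset({1}): [28.0, 24.0],
    frozenset({0, 1}): [22.0, 25.0],
}

worst_for_bus_1 = worst_region(baseline, region_lmps, bus=1)
worst_system_wide = worst_region(baseline, region_lmps)
```

The two criteria can disagree, as they do here: the region that moves the price at one bus the most is not necessarily the one with the largest system-wide perturbation.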



C. Worst RPP under Partially Adaptive Bad Data 

For bad data model M2, only part of the measurement values
in real-time are known to the adversary, denoted as $z_o$. The
adversary has to first estimate the system state
from the observation and the prior distribution, then make the
attack decision based on the estimation result.

Without the presence of the bad data vector, i.e., $a = 0$, the
system equation (5) gives

$$ z_o = H_o x + w_o, \tag{24} $$

where $H_o$ consists of the rows of $H$ corresponding to the observed
measurements and $w_o$ is the corresponding part of the measurement
noise $w$.

The minimum mean square error (MMSE) estimate of $x$
given $z_o$ is given by the conditional mean

$$ \mathbb{E}[x \mid z_o] = x_0 + \Sigma_x H_o^T (H_o \Sigma_x H_o^T)^{-1} (z_o - H_o x_0). \tag{25} $$

Then, the flow estimate on branch $i$ after attack is

$$ \mathbb{E}[\hat{f}_i \mid z_o] = F_{i\cdot}\,\mathbb{E}[\hat{x} \mid z_o]. \tag{26} $$



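Still, we want to move the state estimate to the
"center" of the target price region. On the other hand, the expected
measurement value is $\mathbb{E}[z_a \mid z_o] = H\,\mathbb{E}[x \mid z_o] + a$. We need a
pre-designed parameter $\epsilon$ to control the detection probability.
Therefore, the solution to the following optimization problem is the
best attack given congestion pattern $\mathcal{C}$:

$$
\begin{array}{ll}
\underset{a \in \mathcal{A},\ \beta \ge 0}{\text{maximize}} & \beta \\
\text{subject to} & \mathbb{E}[\hat{f}_i \mid z_o] - \beta \ge T_i^{\max}, \quad i \in \mathcal{C}, \\
& \mathbb{E}[\hat{f}_j \mid z_o] + \beta \le T_j^{\max}, \quad j \notin \mathcal{C}, \\
& (\mathbb{E}[z_a \mid z_o])^T\, W\, (\mathbb{E}[z_a \mid z_o]) \le \epsilon.
\end{array}
\tag{27}
$$

This problem is also a convex optimization problem, which
can be easily solved. Among all the patterns $\mathcal{C}$ that make the above
problem feasible, we choose the one with the largest price
perturbation, denoted as $\mathcal{C}^*$. The solution to problem (27) with
$\mathcal{C}^*$ as the congestion pattern is the worst bad data vector.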



D. Worst RPP under Fully Adaptive Bad Data 

Finally, we consider the bad data model M3, in which the
whole set of measurements $z$ is known to the adversary. The
worst bad data vector depends on the value of $z$. Different from
the previous two models, with bad data vector $a$, the estimated
state is deterministic without uncertainty. In particular,

$$ \hat{x} = Kz + Ka. \tag{28} $$

The estimated flow on branch $i$ after attack is also
deterministic:

$$ \hat{f}_i = F_{i\cdot}\,\hat{x} = F_{i\cdot}\,Kz + F_{i\cdot}\,Ka. \tag{29} $$

Similar to the previous two models, a congestion pattern $\mathcal{C}$ is
called feasible if there exists some bad data vector $a \in \mathcal{A}$ that
satisfies the following conditions:

$$
\begin{aligned}
\hat{f}_i &\ge T_i^{\max}, \quad i \in \mathcal{C}, \\
\hat{f}_j &< T_j^{\max}, \quad j \notin \mathcal{C}, \\
(z + a)^T W (z + a) &\le \tau.
\end{aligned}
\tag{30}
$$

Among all the feasible congestion patterns, we choose the
one with the largest price perturbation, $\mathcal{C}^*$. Any bad data vector
$a$ that satisfies condition (30) can serve as the worst fully adaptive
bad data.
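
Because everything is deterministic in the fully adaptive model, checking whether a candidate vector $a$ works for a target pattern is a direct evaluation. The sketch below assumes the post-attack flows have already been computed by the caller and that the detector compares $(z + a)^T W (z + a)$ with $\tau$; the matrix `W` corresponds to three unit-variance meters observing one scalar state, and all numbers are illustrative.

```python
def quad_form(v, W):
    """Quadratic form v^T W v with W given as a list of rows."""
    n = len(v)
    return sum(v[i] * W[i][j] * v[j] for i in range(n) for j in range(n))


def attack_is_feasible(z, a, W, tau, flows_after, t_max, pattern):
    """True if z + a passes the detector and the flows realize `pattern`."""
    z_a = [z_i + a_i for z_i, a_i in zip(z, a)]
    undetected = quad_form(z_a, W) <= tau
    in_region = all((f >= t) == (j in pattern)
                    for j, (f, t) in enumerate(zip(flows_after, t_max)))
    return undetected and in_region


# W = I - (1/3) * ones: residue projection for three identical meters, R = I.
W = [[2 / 3, -1 / 3, -1 / 3],
     [-1 / 3, 2 / 3, -1 / 3],
     [-1 / 3, -1 / 3, 2 / 3]]
z = [1.0, 1.0, 1.0]
target = frozenset({0})

ok_small = attack_is_feasible(z, [0.0, 0.3, 0.0], W, 5.991, [1.1], [1.05], target)
ok_large = attack_is_feasible(z, [0.0, 3.0, 0.0], W, 5.991, [2.0], [1.05], target)
```

The small perturbation reaches the target region while keeping the statistic well below the threshold; the large one also reaches the region but drives the statistic above it and is rejected.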

V. Bad Topology Data on LMP 

So far, we have considered bad data in the analog measure- 
ments. In this section, we include the bad topology data, and 
describe another bad data model. 

We represent the network topology by a directed graph $\mathcal{G} =
(\mathcal{V}, \mathcal{E})$ where each $i \in \mathcal{V}$ denotes a bus and each $(i, j) \in
\mathcal{E}$ denotes a connected transmission line. For each physical
transmission line (e.g., a physical line between $i$ and $j$), we
assign an arbitrary direction (e.g., $(i, j)$) for the line, and $(i, j)$
is in $\mathcal{E}$ if and only if bus $i$ and bus $j$ are connected in the power
network.

Bad data may appear in both analog measurements and
digital (e.g., breaker status) data, as described in Section III-A:

$$
\begin{aligned}
z_a &= z + a = (Hx + w) + a, \quad a \in \mathcal{A}, \\
s_b &= s + b \pmod 2, \quad b \in \mathcal{B}.
\end{aligned}
\tag{31}
$$

As in Section IV, we employ the adversary model to
describe the worst case. The adversary alters $s$ to $s_b$ by adding
$b$ from the set of feasible attack vectors $\mathcal{B} \subset \{0, 1\}^l$ such that
the topology processor produces the "target" topology $\bar{\mathcal{G}}$ as
the topology estimate. In addition, the adversary modifies $z$
by adding $a \in \mathcal{A}$ such that $z_a$ looks consistent with $\bar{\mathcal{G}}$.

In this section, we focus on the worst case when the 
adversary is able to alter the network topology without changing 
the state estimate.* We also require that such bad data 
are generated by an adversary causing an undetectable topology 
change, i.e., the bad data escape the system bad data detection. 
For the worst case analysis, we will maximize the LMP 
perturbation among the attacks within this specific class. Even 
though this approach is suboptimal, the simulation results in 
Section VI demonstrate that the resulting LMP perturbation is 
much greater than the worst case of the bad meter data. 

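Suppose the adversary wants to mislead the control center 
with the target topology $\bar{\mathcal{G}} = (\mathcal{V}, \bar{\mathcal{E}})$, a topology obtained by 
removing a set of transmission lines $\mathcal{E}_\Delta$ in $\mathcal{G}$ (i.e., $\bar{\mathcal{E}} = \mathcal{E} \setminus \mathcal{E}_\Delta$). 
We assume that the system with $\bar{\mathcal{G}}$ is observable, i.e., the 
measurement matrix $\bar{H}$ corresponding to $\bar{\mathcal{G}}$ has full column 
rank.† 

Suppose that the adversary changes the breaker status such 
that the target topology $\bar{\mathcal{G}} = (\mathcal{V}, \bar{\mathcal{E}})$ is observed at the control 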

* In general, the adversary can design the worst data to affect both the state 
estimate and network topology. It is, however, much more difficult to make 
such an attack undetectable. 

† Without observability, the system may not proceed to state estimation and 
real-time pricing. Hence, for the adversary to affect pricing, the system with 
the target topology has to be observable. 



Meter    $Hx$                                         $\bar{H}x$
2        B21 x2 + B24 (x2 - x4) - B32 (x3 - x2)       B21 x2 + B24 (x2 - x4)
4        -B24 (x2 - x4) - B34 (x3 - x4)               -B24 (x2 - x4) - B34 (x3 - x4)
(1,3)    B13 (-x3)                                    B13 (-x3)
(2,1)    B21 x2                                       B21 x2
(2,4)    B24 (x2 - x4)                                B24 (x2 - x4)
(3,2)    B32 (x3 - x2)                                0
(3,4)    B34 (x3 - x4)                                B34 (x3 - x4)

Fig. 3: $Hx$ and $\bar{H}x$: Each row is marked by the 
corresponding meter (i for injection at i and (i, j) 
for flow from i to j). 



center. Simultaneously, if the adversary introduces bad data 
$a = \bar{H}x - Hx$, then the meter data received at the control 
center become 

$$z_a = Hx + a + w = \bar{H}x + w, \tag{32}$$

which means that the data received at the control center are 
completely consistent with the model generated from $\bar{\mathcal{G}}$. Thus, 
no bad data detector will be effective. 
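
As a numerical illustration of (32), the sketch below builds the DC measurement matrices for the four-bus example of Fig. 3 (injection meters at buses 2 and 4, flow meters on all five lines, bus 1 as the phase reference) and removes line (3, 2). The susceptance values, the state x, and the noise level are hypothetical.

```python
import numpy as np

BUSES = [1, 2, 3, 4]                      # bus 1 is the reference (phase 0)
LINES = [(1, 3), (2, 1), (2, 4), (3, 2), (3, 4)]
B = {(1, 3): 4.0, (2, 1): 5.0, (2, 4): 3.0, (3, 2): 2.0, (3, 4): 6.0}  # hypothetical
INJ_METERS = [2, 4]

def flow_row(i, j, connected):
    """DC-model row for the flow i->j, B_ij (x_i - x_j); zero if the line is open."""
    row = np.zeros(len(BUSES))
    if (i, j) in connected:
        row[BUSES.index(i)] += B[(i, j)]
        row[BUSES.index(j)] -= B[(i, j)]
    return row

def build_H(connected):
    """Measurement matrix for the topology given by the set of connected lines."""
    rows = []
    for bus in INJ_METERS:
        r = np.zeros(len(BUSES))
        for (i, j) in LINES:
            if i == bus:
                r += flow_row(i, j, connected)   # flow leaving the bus
            if j == bus:
                r -= flow_row(i, j, connected)   # flow entering the bus
        rows.append(r)
    rows += [flow_row(i, j, connected) for (i, j) in LINES]
    return np.array(rows)[:, 1:]                 # drop the reference-bus column

rng = np.random.default_rng(0)
x = np.array([0.05, -0.02, 0.01])                # phases at buses 2, 3, 4
w = 0.001 * rng.standard_normal(7)               # measurement noise

H = build_H(set(LINES))
H_bar = build_H(set(LINES) - {(3, 2)})           # target topology
a = H_bar @ x - H @ x                            # state-preserving bad data
z_a = H @ x + w + a                              # what the control center receives
```

In this example a is nonzero only at the injection meter of bus 2 and the flow meter (3, 2), and z_a coincides with H_bar @ x + w, so a residual test built on H_bar sees only the measurement noise.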

It is of course not obvious how to produce the bad data 
a, especially when the adversary can only modify a limited 
number of measurements and may not have access to 
the entire state vector x. Fortunately, it turns out that a can 
be generated by observing only a few entries in z without 
requiring global information (such as the state vector x) [21]. 

A key observation is that $Hx$ and $\bar{H}x$ differ only in a few 
entries corresponding to the modified topology (lines in $\mathcal{E}_\Delta$), 
as illustrated in Fig. 3. Consider first the noiseless case. Let $z_{ij}$ 
denote the entry of z corresponding to the flow measurement 
from i to j. As Fig. 3 suggests, $\bar{H}x - Hx$ has the following 
sparse structure [21]: 

$$\bar{H}x - Hx = -\sum_{(i,j) \in \mathcal{E}_\Delta} \alpha_{ij}\, m_{(i,j)}, \tag{33}$$



where $\alpha_{ij} \in \mathbb{R}$ denotes the line flow from i to j when (i, j) is 
connected and the system state is x, and $m_{(i,j)}$ is the column of 
the measurement-to-branch incidence matrix that corresponds 
to (i, j); i.e., $m_{(i,j)}$ is an m-dimensional vector with 1 at the 
entries corresponding to the flow from i to j and the injection 
at i, −1 at the entries corresponding to the flow from j 
to i and the injection at j, and 0 at all other entries. Absence 
of noise implies that $z_{ij} = \alpha_{ij}$, which leads to 

$$\bar{H}x - Hx = -\sum_{(i,j) \in \mathcal{E}_\Delta} z_{ij}\, m_{(i,j)}. \tag{34}$$
TABLE I: Adversary-controlled measurements

System      Bus injection            Line flows (both directions) and line breaker states
IEEE-14     2, 5, 7, 9               (2, 5), (7, 9)
IEEE-118    4, 5, 51, 52, 88, 89     (4, 5), (51, 52), (88, 89)

Fig. 4: This figure describes how the attack alters the 
local measurements around the line (i, j) in $\mathcal{E}_\Delta$. 
[Legend: unaltered measurements; attack-modified measurements.] 



With (34) in mind, one can see that setting $a = \bar{H}x - Hx$ 
and adding a to z is equivalent to the following simple 
procedure: as described in Fig. 4, for each (i, j) in $\mathcal{E}_\Delta$, 

1) Subtract $z_{ij}$ and $z_{ji}$ from $z_i$ and $z_j$, respectively. 

2) Set $z_{ij}$ and $z_{ji}$ to be 0. 

where $z_i$ is the entry of z corresponding to the injection 
measurement at bus i. 
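
The two steps above can be sketched as a purely local operation on the meter readings. The dictionary-based data layout and the function name below are assumptions made for illustration, and the sketch assumes that injections at both end buses and flows in both directions are metered, as in the simulations of Section VI.

```python
def state_preserving_attack(z_inj, z_flow, removed_lines):
    """Apply steps 1) and 2) for every line (i, j) to be hidden.

    z_inj:  {bus: injection measurement z_i}
    z_flow: {(i, j): flow measurement z_ij from i to j}
    Returns attacked copies; the inputs are left unchanged.
    """
    inj, flow = dict(z_inj), dict(z_flow)
    for (i, j) in removed_lines:
        # Step 1: subtract the original z_ij from z_i and z_ji from z_j.
        inj[i] -= z_flow[(i, j)]
        inj[j] -= z_flow[(j, i)]
        # Step 2: report zero flow in both directions on the hidden line.
        flow[(i, j)] = 0.0
        flow[(j, i)] = 0.0
    return inj, flow

# Hypothetical readings around line (2, 5).
z_inj = {2: 1.5, 5: -0.5}
z_flow = {(2, 5): 0.75, (5, 2): -0.75}
new_inj, new_flow = state_preserving_attack(z_inj, z_flow, [(2, 5)])
```

Only the four readings adjacent to each hidden line are read or written, which is why the attack does not require knowledge of the state vector x.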

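When measurement noise is present (i.e., $z = Hx + w$), 
the idea of the attack is still the same: make a approximate 
$\bar{H}x - Hx$ so that $z_a$ is close to $\bar{H}x + w$. Since $z_{ij} = \alpha_{ij} + w_{ij}$, 
$z_{ij}$ is an unbiased estimate of $\alpha_{ij}$ for each $(i, j) \in \mathcal{E}_\Delta$, and 
this implies that $-\sum_{(i,j) \in \mathcal{E}_\Delta} z_{ij}\, m_{(i,j)}$ is an unbiased estimate 
of $-\sum_{(i,j) \in \mathcal{E}_\Delta} \alpha_{ij}\, m_{(i,j)} = \bar{H}x - Hx$. Hence, we set a to be 
$-\sum_{(i,j) \in \mathcal{E}_\Delta} z_{ij}\, m_{(i,j)}$, the same as in the noiseless setting, 
and the attack is executed by the same steps as above. 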

To launch this attack and modify the topology estimate 
from $\mathcal{G}$ to $\bar{\mathcal{G}}$, the adversary should be able to (i) set b such 
that the topology processor produces $\bar{\mathcal{G}}$ instead of $\mathcal{G}$ and (ii) 
observe and modify $z_{ij}$, $z_{ji}$, $z_i$, and $z_j$ for all $(i, j) \in \mathcal{E}_\Delta$. 
The attack is feasible if and only if $\mathcal{A}$ and $\mathcal{B}$ contain the 
corresponding attack vectors. 

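To find the worst case LMP perturbation due to undetectable, 
state-preserving attacks, let $\mathcal{F}$ denote the set of 
feasible targets $\bar{\mathcal{G}}$ for which the attack can be launched with $\mathcal{A}$ and 
$\mathcal{B}$. Among the feasible targets in $\mathcal{F}$, we consider the best target 
topology, i.e., the one that results in the maximum perturbation in 
real-time LMPs. If RPP is used with bus i, the best target is chosen as 

$$\bar{\mathcal{G}}^*[z] = \arg\max_{\bar{\mathcal{G}} \in \mathcal{F}} \left| \lambda_i(z; \bar{\mathcal{G}}) - \lambda_i(z; \mathcal{G}) \right|, \tag{35}$$

where $\lambda_i(z; \bar{\mathcal{G}})$ denotes the real-time LMP at bus i when the 
attack with the target $\bar{\mathcal{G}}$ is launched on z, and $\lambda_i(z; \mathcal{G})$ is the 
real-time LMP under no attack. On the other hand, if ARPP 
is used, the best target is chosen as 

$$\bar{\mathcal{G}}^*[z] = \arg\max_{\bar{\mathcal{G}} \in \mathcal{F}} \sum_{i} \left| \frac{\lambda_i(z; \bar{\mathcal{G}}) - \lambda_i(z; \mathcal{G})}{\lambda_i(z; \mathcal{G})} \right|. \tag{36}$$

For both RPP and ARPP, the bad data in the worst case are 
the attack vectors for the target $\bar{\mathcal{G}}^*[z]$. 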
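
Since the set of feasible targets is finite, the best target can be found by enumeration. The sketch below assumes that the real-time LMPs under each candidate target have already been computed by the pricing model (not shown here); the candidate labels and LMP values are hypothetical, and the ARPP score is taken as the sum over buses of the relative absolute LMP change.

```python
def rpp_score(lmp_attack, lmp_clean, bus):
    """Absolute LMP change at one bus; the clean LMP is the same for all targets."""
    return abs(lmp_attack[bus] - lmp_clean[bus])

def arpp_score(lmp_attack, lmp_clean):
    """Sum over buses of the relative LMP change (assumed form of the ARPP score)."""
    return sum(abs((lmp_attack[i] - lmp_clean[i]) / lmp_clean[i]) for i in lmp_clean)

def best_target(candidates, score):
    """Return the label of the feasible target with the largest score."""
    return max(candidates, key=lambda label: score(candidates[label]))

# Hypothetical LMPs ($/MW) at two buses, without attack and under two targets.
lmp_clean = {1: 20.0, 2: 25.0}
candidates = {
    "remove (2, 5)": {1: 20.0, 2: 35.0},
    "remove (7, 9)": {1: 30.0, 2: 27.0},
}

best_rpp = best_target(candidates, lambda lmp: rpp_score(lmp, lmp_clean, bus=2))
best_arpp = best_target(candidates, lambda lmp: arpp_score(lmp, lmp_clean))
```

In this toy example the two criteria select different targets: the first candidate moves the price at bus 2 the most, while the second yields the larger aggregate relative change.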

VI. Numerical Results 

In this section, we demonstrate the impact of bad data 
on real-time LMPs with numerical simulations on the IEEE-14 
and IEEE-118 systems. We conducted simulations in two 
different settings: the linear model with the DC state estimator 
and the nonlinear model with the AC state estimator. The 
former is usually employed in the literature for ease of 
analysis, whereas the latter represents the practical state 
estimator used in real-world power systems. In all simulations, 
the meter measurements consist of power injections at all buses 
and power flows (both directions) at all branches. 

A. Linear model with DC state estimation 

We first present the simulation results for the linear model 
with the DC state estimator. We modeled bus voltage 
magnitudes and phases as Gaussian random variables with 
means equal to the day-ahead dispatched values and small 
standard deviations. In each Monte Carlo run, we generated 
a state realization from the statistical model, and the meter 
measurements were created by the DC model with Gaussian 
measurement noise. Once the measurements were created, bad 
data were added in the manners discussed in Section IV and 
Section V. With the corrupted measurements, the control center 
executed the DC state estimation and the bad data test with the 
false alarm constraint 0.1. If the data passed the bad data test, 
real-time LMPs were evaluated based on the state estimation 
results. 
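
The Monte Carlo procedure described above can be sketched as follows. This is an illustrative simplification rather than the simulation code: the states are phases only, the LMP evaluation is omitted, the measurement matrix H is a hypothetical 5-meter, 2-state example, and the detection threshold is calibrated empirically on clean data to meet the false alarm constraint instead of being taken from a chi-square table.

```python
import numpy as np

def residual_stat(H, z, sigma):
    """Normalized squared residual of the least-squares (DC) state estimate."""
    x_hat, *_ = np.linalg.lstsq(H / sigma, z / sigma, rcond=None)
    r = (z - H @ x_hat) / sigma
    return float(r @ r)

def calibrate_threshold(H, x_mean, sigma, false_alarm, runs, rng):
    """Empirical (1 - false_alarm) quantile of the statistic on clean data."""
    stats = [residual_stat(H, H @ x_mean + sigma * rng.standard_normal(H.shape[0]), sigma)
             for _ in range(runs)]
    return float(np.quantile(stats, 1.0 - false_alarm))

def detection_probability(H, x_mean, x_std, sigma, a, tau, runs, rng):
    """Fraction of Monte Carlo runs in which the bad data test raises an alarm."""
    hits = 0
    for _ in range(runs):
        x = x_mean + x_std * rng.standard_normal(x_mean.size)     # state realization
        z = H @ x + sigma * rng.standard_normal(H.shape[0]) + a   # corrupted meters
        hits += residual_stat(H, z, sigma) > tau
    return hits / runs

rng = np.random.default_rng(1)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [1.0, 1.0], [2.0, 1.0]])
x_mean, x_std, sigma = np.array([0.10, 0.05]), 0.01, 0.01
tau = calibrate_threshold(H, x_mean, sigma, false_alarm=0.1, runs=2000, rng=rng)

a_none = np.zeros(5)
a_hidden = H @ np.array([0.3, -0.2])            # lies in the column space of H
a_gross = np.array([0.5, 0.0, 0.0, 0.0, 0.0])   # single gross error
p_none = detection_probability(H, x_mean, x_std, sigma, a_none, tau, 2000, rng)
p_hidden = detection_probability(H, x_mean, x_std, sigma, a_hidden, tau, 2000, rng)
p_gross = detection_probability(H, x_mean, x_std, sigma, a_gross, tau, 2000, rng)
```

Bad data in the column space of H leave the residual unchanged, so their detection probability stays near the false alarm rate, whereas an isolated gross error is flagged almost surely.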

For the IEEE-14 and IEEE-118 systems, the network 
parameters‡ are available in [22]. Table I contains the list of 
measurements to which the adversary was assumed to be able 
to add bad data. In the partially adaptive bad data case, the 
adversary was assumed to observe measurements from half 
of the meters. 

Fig. 5 and Fig. 6 are the plots of RPPs and ARPPs§ versus 
detection probabilities of bad data. They show that even when 
bad data were detected with low probability, RPPs and ARPPs 
were large, especially for the fully adaptive bad meter data and 
the bad topology data. 

Comparing the RPPs and ARPPs of the three bad meter data 
models, we observe that the adversary may significantly 
increase the perturbation by exploiting partial or all real-time 
meter data. It is worth pointing out that bad topology 

‡ In addition to the network parameters given in [22], we used the following 
line limit and real-time offer parameters. In the IEEE-14 simulation, the 
generators at buses 1, 2, 3, 6, and 8 had capacities 330, 140, 100, 100, 
and 100 MW and real-time offers 15, 31, 30, 10, and 20 $/MW. Lines (2, 
3), (4, 5), and (6, 11) had line capacities 50, 50, and 20 MW, and other lines 
had no line limit. In the IEEE-118 simulation, the generators had generation 
costs arbitrarily selected from {20, 25, 30, 35, 40 $/MW} and generation 
capacities arbitrarily selected from {200, 250, 300, 350, 400 MW}. A total of 16 
lines had line capacities arbitrarily selected from {70, 90, 110 MW}, and 
other lines had no line limit. 

§ For RPPs and ARPPs in the bad topology data cases, we took the average 
over RPPs and ARPPs that are less than 200% to exclude the cases of price 
spikes, in which drastic changes occur in real-time LMPs. With the price 
spikes included, RPPs and ARPPs for the bad topology data become even 
larger than what we present here. 

In addition, the detection probabilities for the fully adaptive bad meter data 
and the bad topology data cases were less than 0.1 in all the simulations. In 
the figures, we draw RPPs and ARPPs of those cases as horizontal lines so 
that we can compare them with other cases. 
Fig. 5: Linear model: RPP vs. detection probability. 
(a) IEEE-14: RPP for bus 14. (b) IEEE-118: RPP for bus 7. 
[Horizontal axis: detection probability.] 

Fig. 6: Linear model: ARPP vs. detection probability. 
(b) IEEE-118: ARPP of the worst topology data is 59.8%. 

data result in much greater price perturbation than bad meter 
data. 

Recall that bad topology 
data and bad meter data employ different price-perturbing 
mechanisms: bad topology data perturb real-time LMPs by 
restructuring the price regions (without perturbing the state 
estimate), whereas bad meter data perturb real-time LMPs by 
moving the state estimate to a different price region (without 
changing the price regions). Therefore, the observation implies 
that restructuring the price regions has a much greater impact 
on real-time LMPs than merely perturbing the state estimate. 

B. Nonlinear model with AC state estimation 

The simulations with the nonlinear model are intended to 
investigate the vulnerability of the real-world power system to the 
Fig. 7: Nonlinear model: RPP vs. detection probability. 
(a) IEEE-14: RPP for bus 14. RPP of the worst topology data case is 23.9%. 
(b) IEEE-118: RPP for bus 7. RPP of the worst topology data case is 30.0%. 
[Horizontal axis: detection probability.] 



worst adversarial act, designed based on the linear model. 
The simulations were conducted with the IEEE-14 and IEEE-118 
systems in the same manner as the linear case except that we 
employed the nonlinear model and the AC state estimation. 

Fig. 7 and Fig. 8 are the plots of RPPs and ARPPs versus 
detection probabilities. Compared to the linear case results, 
RPPs and ARPPs of bad meter data were much lower whereas 
RPPs and ARPPs of bad topology data were still significant. 
Recall that the bad meter data were designed based on the 
linear model to move the state estimate to the worst price 
region. Small price perturbations by bad meter data imply 
that the state estimate of the AC state estimation was not 
moved as the adversary intended. On the other hand, large 
price perturbations by bad topology data imply that the 
real-time LMPs in real-world power systems may experience 
large errors in the presence of maliciously designed bad 
topology data. 

VII. Conclusion 

We report in this paper a study on the impacts of the worst 
data on the real-time market operation. A key result of this 
paper is the geometric characterization of real-time LMPs 
given in Theorem 1. This result provides insights into the 
relation between data and LMPs and serves as the basis for 
characterizing the impacts of the worst data. 

Our investigation on the effect of the worst data includes bad 
data scenarios that arise from both analog meter measurements 
and digital breaker state data. To this end, we have developed 
a systematic approach to finding the worst data by casting 
the problem as one involving an adversary injecting malicious 
Fig. 8: Nonlinear model: ARPP vs. detection probability. 
[Surviving panel caption fragment: "... is 65.6%."] 



data. While such an approach often gives overly conservative 
analysis, it does provide important assurance when the impacts 
of the worst data are deemed acceptable. On the other hand, 
because we use adversary attacks as a way to study the worst 
data, our results have direct implications when cyber-security 
of smart grid is considered. 

Although our findings are obtained from academic 
benchmarks involving relatively small networks, we believe that 
the general trend that characterizes the effects of bad data is 
likely to persist in practical networks of much larger size. In 
particular, as the network size increases and the number of 
bad data appearing simultaneously is limited, the effects of 
the worst meter data on LMP decrease whereas the effects of 
the worst topology data stay nonnegligible regardless of the 
network size. This observation suggests that the bad topology 
data are potentially more detrimental to the real-time market 
operation than the bad meter data. 